Loop Parallelization Algorithms: From Parallelism Extraction to Code Generation
Authors
Abstract
In this paper, we survey loop parallelization algorithms, analyzing the dependence representations they use, the loop transformations they generate, the code generation schemes they require, and their ability to incorporate various optimizing criteria such as maximal parallelism detection, permutable loop detection, minimization of synchronizations, and ease of code generation. We complete the discussion by presenting new results related to code generation and loop fusion for a particular class of multidimensional schedules called shifted linear schedules. We demonstrate that algorithms based on such schedules lead to simple codes. © 1998 Elsevier Science B.V. All rights reserved.
Related resources
Extracting Statistical Loop-Level Parallelism using Hardware-Assisted Recovery
Chip multiprocessors with multiple simpler cores are gaining popularity because they have the potential to drive future performance gains without exacerbating the problems of power dissipation and hardware complexity. These designs provide real benefits for server-class applications that are explicitly multi-threaded. However, for desktop and other systems, there is a large code base of single-...
Parallelization of Loops with Variable Distance Data Dependences
The extent of parallelization of a loop is largely determined by the dependences between its statements. While dependence-free loops are fully parallelizable, those with loop-carried dependences are not. Dependence distance is a measure of the absolute difference between a pair of dependent iterations. Loops with constant distance data dependence (CD3), because of uniform distance between the depend...
Dynamic and Speculative Polyhedral Parallelization of Loop Nests Using Binary Code Patterns
Speculative parallelization is a classic strategy for automatically parallelizing codes that cannot be handled at compile-time due to the use of dynamic data and control structures. Another motivation of being speculative is to adapt the code to the current execution context, by selecting at run-time an efficient parallel schedule. However, since this parallelization scheme requires on-the-fly ...
Global Instruction Scheduling for Multi-threaded Architectures
Recently, the microprocessor industry has moved toward multi-core or chip multiprocessor (CMP) designs as a means of utilizing the increasing transistor counts in the face of physical and micro-architectural limitations. Despite this move, CMPs do not directly improve the performance of single-threaded codes, a characteristic of most applications. In effect, the move to CMPs has shifted even mo...
Bouclettes: A Fortran Loop Parallelizer
High Performance Fortran is a data-parallel language that allows the user to specify the parallelism in a program. It is not always easy to extract the parallelism in a given program. To help the user, an automatic loop parallelizer has been developed: Bouclettes. Bouclettes has been written to validate some scheduling and mapping techniques that are mentioned in this paper. A Fortran 77 loop...
Journal: Parallel Computing
Volume: 24
Publication date: 1998